{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "    <p style=\"text-align:center\">\n",
    "        <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
    "        <br>\n",
    "        <a href=\"https://arize.com/docs/phoenix/\">Docs</a>\n",
    "        |\n",
    "        <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
    "        |\n",
    "        <a href=\"https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email\">Community</a>\n",
    "    </p>\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Instrumenting AWS Bedrock client with OpenInference and Phoenix\n",
    "\n",
    "In this tutorial we will trace model calls to AWS Bedrock using OpenInference. The OpenInference Bedrock tracer instruments the Python `boto3` library, so all `invoke_model` calls will automatically generate traces that can be sent to Phoenix.\n",
    "\n",
    "ℹ️ This notebook requires a valid AWS configuration and access to AWS Bedrock and the `claude-v2` model from Anthropic & an OpenAI API key for LLM as a Judge Evaluation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Install dependencies and set up OpenTelemetry tracer\n",
    "\n",
    "First install dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install arize-phoenix boto3 openinference-instrumentation-bedrock"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following env variables will allow you to connect to a cloud instance of Arize Phoenix. You can get your API key and endpoint on setting page of your [Phoenix Cloud](https://app.phoenix.arize.com) account.\n",
    "\n",
    "If you'd prefer to self-host Phoenix, please see [instructions for self-hosting](https://arize.com/docs/phoenix/deployment). The Cloud and Self-hosted versions are functionally identical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "import boto3\n",
    "from openinference.instrumentation.bedrock import BedrockInstrumentor\n",
    "\n",
    "import phoenix as px\n",
    "from phoenix.otel import SimpleSpanProcessor, register\n",
    "\n",
    "if not (phoenix_endpoint := os.getenv(\"PHOENIX_COLLECTOR_ENDPOINT\")):\n",
    "    phoenix_endpoint = getpass(\"🔑 Enter your Phoenix Collector Endpoint: \")\n",
    "os.environ[\"PHOENIX_COLLECTOR_ENDPOINT\"] = phoenix_endpoint\n",
    "\n",
    "if not (phoenix_api_key := os.getenv(\"PHOENIX_API_KEY\")):\n",
    "    phoenix_api_key = getpass(\"🔑 Enter your Phoenix API key: \")\n",
    "os.environ[\"PHOENIX_API_KEY\"] = phoenix_api_key"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we're configuring the OpenTelemetry tracer by adding two SpanProcessors. The first SpanProcessor will simply print all traces received from OpenInference instrumentation to the console. The second will export traces to Phoenix so they can be collected and viewed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tracer_provider = register(project_name=\"bedrock-tracing-and-evals-tutorial\", auto_instrument=True)\n",
    "tracer_provider.add_span_processor(\n",
    "    SimpleSpanProcessor(endpoint=os.getenv(\"PHOENIX_COLLECTOR_ENDPOINT\"))\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Alternative: Prompt for AWS credentials if not already set\n",
    "# Uncomment the code below to be prompted for credentials interactively\n",
    "\n",
    "if not os.getenv(\"AWS_ACCESS_KEY_ID\"):\n",
    "    os.environ[\"AWS_ACCESS_KEY_ID\"] = getpass(\"🔑 Enter your AWS Access Key ID: \")\n",
    "if not os.getenv(\"AWS_SECRET_ACCESS_KEY\"):\n",
    "    os.environ[\"AWS_SECRET_ACCESS_KEY\"] = getpass(\"🔑 Enter your AWS Secret Access Key: \")\n",
    "# if not os.getenv(\"AWS_DEFAULT_REGION\"):\n",
    "#     os.environ[\"AWS_DEFAULT_REGION\"] = input(\"🌍 Enter your AWS Region (default: us-west-2): \") or \"us-west-2\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Instrumenting Bedrock clients\n",
    "\n",
    "Now, let's create a `boto3` session. This initiates a configured environment for interacting with AWS services. If you haven't yet configured `boto3` to use your credentials, please refer to the [official documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html). Or, if you have the AWS CLI, run `aws configure` from your terminal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session = boto3.session.Session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Clients created using this session configuration are currently uninstrumented. We'll make one for comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "uninstrumented_client = session.client(\"bedrock-runtime\", region_name=\"us-west-2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we instrument Bedrock with our OpenInference instrumentor. All Bedrock clients created after this call will automatically produce traces when calling `invoke_model`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "BedrockInstrumentor().instrument(skip_dep_check=True)\n",
    "instrumented_client = session.client(\"bedrock-runtime\", region_name=\"us-west-2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Calling the LLM and viewing OpenInference traces\n",
    "\n",
    "Calling `invoke_model` using the `uninstrumented_client` will produce no traces, but will show the output from the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = b\"\"\"{\"prompt\": \"Human: What is the 3rd month of the year in alphabetical order? Assistant:\", \"max_tokens_to_sample\": 1024}\"\"\"\n",
    "response = uninstrumented_client.invoke_model(modelId=\"anthropic.claude-v2:1\", body=prompt)\n",
    "response_body = json.loads(response.get(\"body\").read())\n",
    "print(response_body[\"completion\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LLM calls using the `instrumented_client` will print traces to the console! By configuring the `SpanProcessor` to export to a different OpenTelemetry collector, your OpenInference spans can be collected and analyzed to better understand the behavior of your LLM application."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = instrumented_client.invoke_model(modelId=\"anthropic.claude-v2:1\", body=prompt)\n",
    "response_body = json.loads(response.get(\"body\").read())\n",
    "print(response_body[\"completion\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Collect all your Traces & Data\n",
    "\n",
    "Use the `instrumented_client` to collect all your traces; This example uses a set of trivia questions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trivia_questions = [\n",
    "    \"What is the only U.S. state that starts with two vowels?\",\n",
    "    \"What is the 3rd month of the year in alphabetical order?\",\n",
    "    \"What is the capital of Mongolia?\",\n",
    "    \"How many minutes are there in a leap year?\",\n",
    "    \"If a train leaves New York at 3 PM traveling west at 60 mph, and another leaves Chicago at 4 PM traveling east at 80 mph, at what time will they meet?\",\n",
    "    \"Which element has the chemical symbol 'Fe'?\",\n",
    "    \"What five-letter word becomes shorter when you add two letters to it?\",\n",
    "    \"What country has won the most FIFA World Cups?\",\n",
    "    \"If today is Wednesday, what day of the week will it be 100 days from now?\",\n",
    "    \"A farmer has 17 sheep and all but 9 run away. How many does he have left?\",\n",
    "]\n",
    "\n",
    "for i, question in enumerate(trivia_questions, start=1):\n",
    "    prompt_str = f\"\"\"\n",
    "{{\n",
    "    \"prompt\": \"Human: {question} Assistant:\",\n",
    "    \"max_tokens_to_sample\": 300\n",
    "}}\n",
    "\"\"\"\n",
    "    response = instrumented_client.invoke_model(\n",
    "        modelId=\"anthropic.claude-v2:1\",\n",
    "        body=prompt_str.encode(\"utf-8\"),\n",
    "        contentType=\"application/json\",\n",
    "        accept=\"application/json\",\n",
    "    )\n",
    "\n",
    "    response_body = json.loads(response.get(\"body\").read())\n",
    "    print(f\"Q{i}: {question}\")\n",
    "    print(f\"A{i}: {response_body['completion'].strip()}\\n{'-' * 60}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Setup & Run your Eval\n",
    "\n",
    "After importing your traces as a dataframe, modify your columns to fit into your eval template."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "qa_template = \"\"\"You are given a question and an answer. You must determine whether the\n",
    "given answer correctly answers the question. Here is the data:\n",
    "    [BEGIN DATA]\n",
    "    ************\n",
    "    [Question]: {Question}\n",
    "    ************\n",
    "    [Answer]: {Answer}\n",
    "    [END DATA]\n",
    "Your response must be a single word, either \"correct\" or \"incorrect\",\n",
    "and should not contain any text or characters aside from that word.\n",
    "\"correct\" means that the question is correctly and fully answered by the answer.\n",
    "\"incorrect\" means that the question is not correctly or only partially answered by the\n",
    "answer.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spans_df = px.Client().get_spans_dataframe(project_name=\"bedrock-tracing-and-evals-tutorial\")\n",
    "spans_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "eval_df = spans_df[[\"context.span_id\", \"attributes.input.value\", \"attributes.output.value\"]].copy()\n",
    "eval_df.set_index(\"context.span_id\", inplace=True)\n",
    "eval_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "evals_copy = eval_df.copy()\n",
    "evals_copy[\"attributes.input.value\"] = (\n",
    "    evals_copy[\"attributes.input.value\"]\n",
    "    .str.replace(r\"^Human: \", \"\", regex=True)\n",
    "    .str.replace(r\"Assistant:$\", \"\", regex=True)\n",
    ")\n",
    "\n",
    "evals_copy = evals_copy.rename(\n",
    "    columns={\"attributes.input.value\": \"Question\", \"attributes.output.value\": \"Answer\"}\n",
    ")\n",
    "evals_copy.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openinference.instrumentation import suppress_tracing\n",
    "\n",
    "from phoenix.evals import (\n",
    "    async_evaluate_dataframe,\n",
    "    create_classifier,\n",
    ")\n",
    "from phoenix.evals.llm import LLM\n",
    "\n",
    "llm = LLM(provider=\"openai\", model=\"gpt-4\")\n",
    "\n",
    "qa_correctness_eval = create_classifier(\n",
    "    name=\"qa_correctness\",\n",
    "    prompt_template=qa_template,\n",
    "    llm=llm,\n",
    "    choices={\"correct\": 1.0, \"incorrect\": 0.0},\n",
    ")\n",
    "with suppress_tracing():\n",
    "    Q_and_A_classifications = await async_evaluate_dataframe(\n",
    "        dataframe=evals_copy, evaluators=[qa_correctness_eval]\n",
    "    )\n",
    "Q_and_A_classifications.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Log your traces into Phoenix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from phoenix.evals.utils import to_annotation_dataframe\n",
    "\n",
    "relevancy_eval_df = to_annotation_dataframe(dataframe=Q_and_A_classifications)\n",
    "\n",
    "await px.Client().spans.log_span_annotations_dataframe(\n",
    "    dataframe=relevancy_eval_df,\n",
    "    annotator_kind=\"LLM\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![GIF showcasing what the phoenix UI will look like with all the traces & eval](https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/bedrock_tracing_eval_medium.gif)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More information about our instrumentation integrations, OpenInference can be found in our [documentation](https://arize.com/docs/phoenix/references/openinference)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
